String Matching with Variable Length Gaps
Authors
Abstract
We consider string matching with variable length gaps. Given a string T and a pattern P consisting of strings separated by variable length gaps (arbitrary strings of length in a specified range), the problem is to find all ending positions of substrings in T that match P. This problem is a basic primitive in computational biology applications. Let m and n be the lengths of P and T, respectively, and let k be the number of strings in P. We present a new algorithm achieving time O((n+m) log k + α) and space O(m + A), where A is the sum of the lower bounds of the lengths of the gaps in P and α is the total number of occurrences of the strings in P within T. Compared to previous results, this bound essentially achieves the best known time and space complexities simultaneously. Consequently, our algorithm obtains the best known bounds for almost all combinations of m, n, k, A, and α. Our algorithm is surprisingly simple and straightforward to implement.
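To make the problem statement concrete, the following is a minimal Python sketch of a straightforward baseline. It is not the algorithm announced in the abstract and does not attain its O((n+m) log k + α) time bound. The function names, the representation of P as a list of strings plus a list of (lower, upper) gap ranges, and the convention of reporting each ending position as the index just past the last matched character are all assumptions made for illustration.

```python
from itertools import accumulate


def find_occurrences(text, piece):
    """Return every start index where piece occurs in text (overlaps allowed)."""
    starts = []
    i = text.find(piece)
    while i != -1:
        starts.append(i)
        i = text.find(piece, i + 1)
    return starts


def vlg_end_positions(text, strings, gaps):
    """Baseline matcher for a pattern with variable length gaps.

    strings: the k non-empty strings of the pattern, in order.
    gaps:    k - 1 pairs (lo, hi); gaps[i] bounds the number of characters
             allowed between strings[i] and strings[i + 1].
    Returns the sorted list of end indices (exclusive) of matches in text.
    """
    if len(gaps) != len(strings) - 1:
        raise ValueError("need exactly one gap between consecutive strings")
    n = len(text)
    # ends[p] is True when the pattern prefix handled so far can end just before index p.
    ends = [False] * (n + 1)
    for s in find_occurrences(text, strings[0]):
        ends[s + len(strings[0])] = True
    for piece, (lo, hi) in zip(strings[1:], gaps):
        # prefix[p] counts the True entries in ends[0:p], so a range query is O(1).
        prefix = [0] + list(accumulate(int(e) for e in ends))
        new_ends = [False] * (n + 1)
        for s in find_occurrences(text, piece):
            # A previous end p is compatible when lo <= s - p <= hi.
            left, right = max(0, s - hi), s - lo
            if right >= left and prefix[right + 1] - prefix[left] > 0:
                new_ends[s + len(piece)] = True
        ends = new_ends
    return [p for p, ok in enumerate(ends) if ok]
```

For example, under these assumptions `vlg_end_positions("abxxcdxcd", ["ab", "cd"], [(1, 2)])` reports only the match ending at index 6, because the second occurrence of "cd" would need a gap of 5 characters. The sketch scans the text once per pattern string, so it is meant only as a readable reference for the problem definition.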
Related resources
Approximate String Matching with Variable Length Don't Care
Searching for DNA or amino acid sequences similar to a given pattern string is very important in molecular biology. In fact, a lot of programs and algorithms have been developed. Most of them are based on alignment of strings or approximate string matching. However, they do not seem to be adequate in some cases. For example, the DNA pattern TATA (known as TATA box) is a common promoter that oft...
Approximate String Matching with Gaps
In this paper we consider several new versions of approximate string matching with gaps. The main characteristic of these new versions is the existence of gaps in the matching of a given pattern in a text. Algorithms are sketched for each version and their time and space complexity is stated. The specific versions of approximate string matching have various applications in computerized music an...
VGRAM: Improving Performance of Approximate Queries on String Collections Using Variable-Length Grams
Many applications need to solve the following problem of approximate string matching: from a collection of strings, how to find those similar to a given string, or the strings in another (possibly the same) collection of strings? Many algorithms are developed using fixed-length grams, which are substrings of a string used as signatures to identify similar strings. In this paper we develop a nov...
String Kernels Based on Variable-Length-Don't-Care Patterns
We propose a new string kernel based on variable-length-don’t-care patterns (VLDC patterns). A VLDC pattern is an element of (Σ∪{⋆})∗, where Σ is an alphabet and ⋆ is the variable-length-don’t-care symbol that matches any string in Σ∗. The number of VLDC patterns matching a given string s of length n is O(2). We present an O(n) algorithm for computing the kernel value. We also propose variations...
Palindromic Decompositions with Gaps and Errors
Identifying palindromes in sequences has been an interesting line of research in combinatorics on words and also in computational biology, after the discovery of the relation of palindromes in the DNA sequence with the HIV virus. Efficient algorithms for the factorization of sequences into palindromes and maximal palindromes have been devised in recent years. We extend these studies by allowing...
Publication date: 2010